Submitted to Eurospeech’99, Budapest
SPEECH/MUSIC DISCRIMINATION BASED ON POSTERIOR PROBABILITY FEATURES
Abstract
A hybrid connectionist-HMM speech recognizer uses a neural network acoustic classifier. This network estimates the posterior probability that the acoustic feature vectors at the current time step should be labelled as each of around 50 phone classes. We sought to exploit informal observations of the distinctions in this posterior domain between nonspeech audio and speech segments well-modeled by the network. We describe four statistics that successfully capture these differences, and which can be combined to make a reliable speech/nonspeech categorization that is closely related to the likely performance of the speech recognizer. We test these features on a database of speech/music examples, and our results match the previously reported classification error, based on a variety of special-purpose features, of 1.4% for 2.5-second segments. We also show that recognizing segments ordered according to their resemblance to clean speech can result in an error rate close to the ideal minimum over all such subsetting strategies.
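The abstract does not name its four statistics. As a rough illustration of the kind of quantity that can be computed in the posterior domain, the sketch below assumes one plausible statistic: the per-frame entropy of the posterior distribution, averaged over a segment. The choice of entropy, the function names, and the toy data (4 classes rather than around 50) are all assumptions made for illustration and are not taken from the abstract.

```python
import math
from typing import Sequence


def frame_entropy(posterior: Sequence[float]) -> float:
    """Shannon entropy (in bits) of one frame's posterior distribution."""
    # Zero-probability classes contribute nothing, so they are skipped.
    return -sum(p * math.log2(p) for p in posterior if p > 0.0)


def mean_segment_entropy(posteriors: Sequence[Sequence[float]]) -> float:
    """Average per-frame entropy over all frames in a segment."""
    if not posteriors:
        raise ValueError("segment must contain at least one frame")
    return sum(frame_entropy(frame) for frame in posteriors) / len(posteriors)


# Hypothetical toy segments: each inner list is one frame's posteriors.
confident = [[0.97, 0.01, 0.01, 0.01], [0.01, 0.97, 0.01, 0.01]]
diffuse = [[0.25, 0.25, 0.25, 0.25], [0.30, 0.20, 0.25, 0.25]]

# A segment the classifier models well should score lower than a diffuse one.
print(mean_segment_entropy(confident) < mean_segment_entropy(diffuse))
```

Under this assumption, a segment whose frames each concentrate probability on one class gets a low score, while a segment with spread-out posteriors gets a high one. A real system would still need a threshold or classifier on top of such a statistic.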
Similar resources
A Sphinx Based Speech-music Segmentation Front-end for Improving the Performance of an Automatic Speech Recognition System in Turkish
In this study, a system that segments an audio signal into speech and music using posterior probability based features is proposed and implemented in Sphinx. Unlike earlier efforts that use Multi-Layer Perceptrons (MLP), this system uses Hidden Markov Model based acoustic models that are trained in Sphinx for posterior probability calculations. Acoustic Models are trained with the HMM-state...
Experiments on Speech/Music Discrimination
The problem of speech/music discrimination has become increasingly important as automatic speech recognition systems are applied to more real-world multimedia domains. One of the issues in the design of a signal classifier is the selection of an appropriate feature set that captures the temporal and spectral structures of the signal. Many features have been used in speech/music discrimination. Th...
Submitted to Eurospeech’99, Budapest MULTI-STREAM SPEECH RECOGNITION: READY FOR PRIME TIME?
Multi-stream and multi-band methods can improve the accuracy of speech recognition systems without overly increasing the complexity. However, they cannot be applied blindly. In this paper, we review our experience applying multi-stream and multi-band methods to the Broadcast News corpus. We found that multi-stream systems using different acoustic front-ends provide a significant improvement over...
Feature fusion for music detection
Automatic discrimination between music, speech and noise has grown in importance as a research topic over recent years. The need to classify audio into categories such as music or speech is an important part of the multimedia document retrieval problem. This paper extends work previously carried out by the authors which compared performance of static and transitional features based on cepstra, ...
Publication date: 1999